NRC Russian-English Machine Translation System for WMT 2016
Authors
Abstract
We describe the statistical machine translation system developed at the National Research Council of Canada (NRC) for the Russian-English news translation task of the First Conference on Machine Translation (WMT 2016). Our submission is a phrase-based SMT system that tackles the morphological complexity of Russian through comprehensive use of lemmatization. The core of our lemmatization strategy is to use different views of Russian for different SMT components: word alignment and bilingual neural network language models use lemmas, while sparse features and reordering models use fully inflected forms. Some components, such as the phrase table, use both views of the source. Russian words that remain out-of-vocabulary (OOV) after lemmatization are transliterated into English using a statistical model trained on examples mined from the parallel training corpus. The NRC Russian-English MT system achieved the highest uncased BLEU and the lowest TER scores among the eight participants in WMT 2016.
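The strategy in the abstract, with different source views per component and an OOV fallback, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the NRC implementation. `Token`, `VIEWS`, `view_for`, `resolve_oov` and `transliterate` are hypothetical names. Lemmas are assumed to come from some external Russian lemmatizer. The fixed character table in `transliterate` is a naive stand-in for the statistical transliteration model that the authors trained on examples mined from the parallel corpus.

```python
from dataclasses import dataclass

# Hypothetical, simplified Cyrillic-to-Latin table (lowercase only). It is a
# stand-in for the learned transliteration model, NOT that model itself.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


@dataclass(frozen=True)
class Token:
    surface: str  # fully inflected Russian form
    lemma: str    # lemma from an assumed external lemmatizer


# Which view(s) each SMT component consumes, as described in the abstract.
VIEWS = {
    "word_alignment": ("lemma",),
    "bilingual_nnlm": ("lemma",),
    "sparse_features": ("surface",),
    "reordering_model": ("surface",),
    "phrase_table": ("surface", "lemma"),
}


def view_for(component: str, tokens: list[Token]) -> list[tuple[str, ...]]:
    """Return, for each token, the form(s) the given component sees."""
    fields = VIEWS[component]
    return [tuple(getattr(t, f) for f in fields) for t in tokens]


def transliterate(word: str) -> str:
    """Naive character-by-character stand-in for a learned transliterator."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word.lower())


def resolve_oov(tok: Token, surface_vocab: set[str], lemma_vocab: set[str]) -> str:
    """Back off from surface form to lemma, and transliterate only if both are unknown.

    Returns the form that would be passed on for translation (the surface form,
    the lemma, or an English transliteration).
    """
    if tok.surface in surface_vocab:
        return tok.surface
    if tok.lemma in lemma_vocab:
        return tok.lemma
    return transliterate(tok.surface)


if __name__ == "__main__":
    tokens = [Token("книгами", "книга"), Token("Шойгу", "Шойгу")]
    print(view_for("word_alignment", tokens))  # lemmas only
    print(view_for("phrase_table", tokens))    # surface and lemma
    print(resolve_oov(tokens[1], set(), {"книга"}))
```

Here the inflected "книгами" is unseen but its lemma "книга" is known, so the lemma view rescues it, while the name "Шойгу" is unknown in both views and falls through to transliteration ("shoygu" with this toy table).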
Similar resources
NRC Machine Translation System for WMT 2017
We describe the machine translation systems developed at the National Research Council of Canada (NRC) for the Russian-English and Chinese-English news translation tasks of the Second Conference on Machine Translation (WMT 2017). We conducted several experiments to explore the best baseline settings for neural machine translation (NMT). In the Russian-English task, to our surprise, our best-perfor...
Edinburgh Neural Machine Translation Systems for WMT 16
We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English↔Czech, English↔German, English↔Romanian and English↔Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with usin...
Omnifluent English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation
This paper describes Omnifluent™ Translate – a state-of-the-art hybrid MT system capable of high-quality, high-speed translations of text and speech. The system participated in the English-to-French and Russian-to-English WMT evaluation tasks with competitive results. The features which contributed the most to high translation quality were training data sub-sampling methods, document-specific ...
CUni Multilingual Matrix in the WMT 2013 Shared Task
We describe our experiments with phrase-based machine translation for the WMT 2013 Shared Task. We trained one system for 18 translation directions between English or Czech on one side and English, Czech, German, Spanish, French or Russian on the other side. We describe a set of results with different training data sizes and subsets. For the pairs containing Russian, we describe a set of indepe...
Factored Machine Translation Systems for Russian-English
We describe the LIA machine translation systems for the Russian-English and English-Russian translation tasks. Various factored translation systems were built using MOSES to take into account the morphological complexity of Russian and we experimented with the romanization of untranslated Russian words.